{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 结构化数组"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "假设我们要保存这样的数据：\n",
    "\n",
    "|name|age|wgt\n",
    "--|--|--|--\n",
    "0|dan|1|23.1\n",
    "1|ann|0|25.1\n",
    "2|sam|2|8.3\n",
    "\n",
    "希望定义一个一维数组，每个元素有三个属性 `name, age, wgt`，此时我们需要使用结构化数组。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "定义数组 `a`：\n",
    "\n",
    "0|1|2|3\n",
    "-|-|-|-\n",
    "1.0|2.0|3.0|4.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "a = np.array([1.0,2.0,3.0,4.0], np.float32)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用 `view` 方法，将 `a` 对应的内存按照复数来解释："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 1.+2.j,  3.+4.j], dtype=complex64)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a.view(np.complex64)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "0|1|2|3\n",
    "-|-|-|-\n",
    "1.0|2.0|3.0|4.0\n",
    "real|imag|real|imag\n",
    "\n",
    "事实上，我们可以把复数看成一个结构体，第一部分是实部，第二部分是虚部，这样这个数组便可以看成是一个结构化数组。\n",
    "\n",
    "换句话说，我们只需要换种方式解释这段内存，便可以得到结构化数组的效果！\n",
    "\n",
    "0|1|2|3\n",
    "-|-|-|-\n",
    "1.0|2.0|3.0|4.0\n",
    "mass|vol|mass|vol\n",
    "\n",
    "例如，我们可以将第一个浮点数解释为质量，第二个浮点数解释为速度，则这段内存还可以看成是包含两个域（质量和速度）的结构体。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "my_dtype = np.dtype([('mass', 'float32'), ('vol', 'float32')])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([(1.0, 2.0), (3.0, 4.0)], \n",
       "      dtype=[('mass', '<f4'), ('vol', '<f4')])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a.view(my_dtype)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里，我们使用 `dtype` 创造了自定义的结构类型，然后用自定义的结构来解释数组 `a` 所占的内存。\n",
    "\n",
    "这里 `f4` 表示四字节浮点数，`<` 表示小字节序。\n",
    "\n",
    "利用这个自定义的结构类型，我们可以这样初始化结构化数组："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[(1.0, 1.0) (1.0, 2.0) (2.0, 1.0) (1.0, 3.0)]\n"
     ]
    }
   ],
   "source": [
    "my_data = np.array([(1,1), (1,2), (2,1), (1,3)], my_dtype)\n",
    "\n",
    "print my_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "第一个元素："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1.0, 1.0)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_data[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "得到第一个元素的速度信息，可以使用域的名称来索引："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_data[0]['vol']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "得到所有的质量信息："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 1.,  1.,  2.,  1.], dtype=float32)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_data['mass']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "自定义排序规则，先按速度，再按质量："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[(1.0, 1.0) (2.0, 1.0) (1.0, 2.0) (1.0, 3.0)]\n"
     ]
    }
   ],
   "source": [
    "my_data.sort(order=('vol', 'mass'))\n",
    "\n",
    "print my_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "回到最初的例子，定义一个人的结构类型："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "person_dtype = np.dtype([('name', 'S10'), ('age', 'int'), ('weight', 'float')])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "查看类型所占字节数："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "22"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "person_dtype.itemsize"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "产生一个 3 x 4 共12人的空结构体数组："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "people = np.empty((3,4), person_dtype)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "分别赋值："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "people['name'] = [['Brad', 'Jane', 'John', 'Fred'],\n",
    "                  ['Henry', 'George', 'Brain', 'Amy'],\n",
    "                  ['Ron', 'Susan', 'Jennife', 'Jill']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "people['age'] = [[33, 25, 47, 54],\n",
    "                 [29, 61, 32, 27],\n",
    "                 [19, 33, 18, 54]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "people['weight'] = [[135., 105., 255., 140.],\n",
    "                    [154., 202., 137., 187.],\n",
    "                    [188., 135., 88., 145.]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[('Brad', 33, 135.0) ('Jane', 25, 105.0) ('John', 47, 255.0)\n",
      "  ('Fred', 54, 140.0)]\n",
      " [('Henry', 29, 154.0) ('George', 61, 202.0) ('Brain', 32, 137.0)\n",
      "  ('Amy', 27, 187.0)]\n",
      " [('Ron', 19, 188.0) ('Susan', 33, 135.0) ('Jennife', 18, 88.0)\n",
      "  ('Jill', 54, 145.0)]]\n"
     ]
    }
   ],
   "source": [
    "print people"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('Jill', 54, 145.0)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "people[-1,-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 从文本中读取结构化数组"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们有这样一个文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing people.txt\n"
     ]
    }
   ],
   "source": [
    "%%writefile people.txt\n",
    "name age weight\n",
    "amy 11 38.2\n",
    "john 10 40.3\n",
    "bill 12 21.2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "利用 `loadtxt` 指定数据类型，从这个文件中读取结构化数组："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([('amy', 11, 38.2), ('john', 10, 40.3), ('bill', 12, 21.2)], \n",
       "      dtype=[('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "person_dtype = np.dtype([('name', 'S10'), ('age', 'int'), ('weight', 'float')])\n",
    "\n",
    "people = np.loadtxt('people.txt', \n",
    "                    skiprows=1,\n",
    "                    dtype=person_dtype)\n",
    "\n",
    "people"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "查看 `name` 域："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['amy', 'john', 'bill'], \n",
       "      dtype='|S10')"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "people['name']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "删除文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "os.remove('people.txt')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于下面的文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing wood.csv\n"
     ]
    }
   ],
   "source": [
    "%%writefile wood.csv\n",
    "item,material,number\n",
    "100,oak,33\n",
    "110,maple,14\n",
    "120,oak,7\n",
    "145,birch,3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "定义转换函数处理材料属性，使之对应一个整数："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "tree_to_int = dict(oak = 1,\n",
    "                   maple=2,\n",
    "                   birch=3)\n",
    "\n",
    "def convert(s):\n",
    "    return tree_to_int.get(s, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用 `genfromtxt` 载入数据，可以自动从第一行读入属性名称："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "data = np.genfromtxt('wood.csv',\n",
    "                     delimiter=',', # 逗号分隔\n",
    "                     dtype=np.int, # 数据类型\n",
    "                     names=True,   # 从第一行读入域名\n",
    "                     converters={1:convert}\n",
    "                    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([(100, 1, 33), (110, 2, 14), (120, 1, 7), (145, 3, 3)], \n",
       "      dtype=[('item', '<i4'), ('material', '<i4'), ('number', '<i4')])"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "查看域："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 2, 1, 3])"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['material']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "删除文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "os.remove('wood.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 嵌套类型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "有时候，结构数组中的域可能包含嵌套的结构，例如，在我们希望在二维平面上纪录一个质点的位置和质量：\n",
    "\n",
    "<table> \n",
    "  <th colspan=\"2\">position\n",
    "  <th rowspan=\"2\">mass\n",
    "  <tr><th>x<th>y\n",
    "</table>\n",
    "\n",
    "那么它的类型可以这样嵌套定义："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "particle_dtype = np.dtype([('position', [('x', 'float'), \n",
    "                                         ('y', 'float')]),\n",
    "                           ('mass', 'float')\n",
    "                          ])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "假设数据文件如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting data.txt\n"
     ]
    }
   ],
   "source": [
    "%%writefile data.txt\n",
    "2.0 3.0 42.0\n",
    "2.1 4.3 32.5\n",
    "1.2 4.6 32.3\n",
    "4.5 -6.4 23.3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "读取数据："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data = np.loadtxt('data.txt', dtype=particle_dtype)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([((2.0, 3.0), 42.0), ((2.1, 4.3), 32.5), ((1.2, 4.6), 32.3),\n",
       "       ((4.5, -6.4), 23.3)], \n",
       "      dtype=[('position', [('x', '<f8'), ('y', '<f8')]), ('mass', '<f8')])"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "查看位置的 `x` 轴："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 2. ,  2.1,  1.2,  4.5])"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['position']['x']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "删除生成的文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "os.remove('data.txt')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
